Fast Algorithms For String Matching With And Without Swaps

Author

  • Kimmo Fredriksson
Abstract

Given a text string T of length n and a pattern string P of length m over some alphabet Σ, we want to find the occurrences of P′ in T such that P′ can be derived from P by a set of local swaps, i.e. transpositions of two adjacent characters, each character swapping at most once. We give several simple but fast algorithms for the problem. The first algorithm is based on the Boyer–Moore–Horspool approach. The second algorithm uses a nondeterministic finite automaton that is simulated using a shift-or type method. We improve the shift-or method to take only time O(n / log_|Σ| s · ⌈(m + 1)/w⌉), where s ≥ |Σ| is the space usage of the algorithm and w is the length of the machine word. This algorithm is sublinear for small patterns and alphabets, and is asymptotically the fastest bit-parallel simulation of sufficiently simple nondeterministic finite automata. Finally, we show how a bit-parallel suffix automaton can be used to solve the problem in optimal average time O(n log m / m), while being only O(n⌈m/w⌉) in the worst case. The algorithms are very simple to implement, and experimental results show that they are very fast on natural language.
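
To make the problem statement concrete, the following is a minimal Python sketch of swap matching. It is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, the bit-parallel routine is a plain shift-and style simulation of a swap automaton (without the paper's improved shift-or, Boyer–Moore–Horspool, or suffix automaton variants), and Python integers stand in for machine words, so the word length w plays no role here.

```python
from typing import Dict, List


def swap_match_naive(text: str, pattern: str) -> List[int]:
    """Reference check: O(n*m) dynamic programming over every text window."""
    n, m = len(text), len(pattern)
    hits: List[int] = []
    if m == 0:
        return hits  # assumption: an empty pattern reports no occurrences
    for i in range(n - m + 1):
        w = text[i:i + m]
        # ok[j]: some swapped version of pattern[:j] equals w[:j]
        ok = [False] * (m + 1)
        ok[0] = True
        for j in range(1, m + 1):
            same = ok[j - 1] and pattern[j - 1] == w[j - 1]
            swapped = (j >= 2 and ok[j - 2]
                       and pattern[j - 1] == w[j - 2]
                       and pattern[j - 2] == w[j - 1])
            ok[j] = same or swapped
        if ok[m]:
            hits.append(i)
    return hits


def swap_match_bitparallel(text: str, pattern: str) -> List[int]:
    """Shift-and style simulation of an automaton that allows adjacent swaps."""
    m = len(pattern)
    if m == 0:
        return []  # same convention as the naive version
    # masks[c] has bit j set iff pattern[j] == c
    masks: Dict[str, int] = {}
    for j, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << j)
    full = (1 << m) - 1
    accept = 1 << (m - 1)
    d = 0  # bit j set: a swapped version of pattern[:j+1] ends at this position
    s = 0  # bit j set: pattern[:j] matched, then pattern[j+1] was just read
    hits: List[int] = []
    for i, c in enumerate(text):
        b = masks.get(c, 0)
        start = ((d << 1) | 1) & full          # prefixes that can be extended
        new_d = (start & b) | ((s & b) << 1)   # plain step, or finish a swap
        new_s = start & (b >> 1)               # begin a swap at position j
        d, s = new_d & full, new_s
        if d & accept:
            hits.append(i - m + 1)
    return hits
```

For the pattern "abc" the admissible swapped versions are "abc", "bac" and "acb", so on the text "xbacacbabc" both functions return the start positions [1, 4, 7]; the window "cba" is rejected because it would require the character b to take part in two swaps.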


Related resources

Approximate Swapped Matching

Let a text string T of n symbols and a pattern string P of m symbols from alphabet Σ be given. A swapped version P′ of P is a length m string derived from P by a series of local swaps (i.e., p′_ℓ ← p_{ℓ+1} and p′_{ℓ+1} ← p_ℓ), where each element can participate in no more than one swap. The Pattern Matching with Swaps problem is that of finding all locations i of T for which there exists a swapped versio...


Efficient Special Cases of Pattern Matching with Swaps

Let a text string T of n symbols and a pattern string P of m symbols from alphabet Σ be given. A swapped version T′ of T is a length n string derived from T by a series of local swaps (i.e., t′_ℓ ← t_{ℓ+1} and t′_{ℓ+1} ← t_ℓ), where each element can participate in no more than one swap. The Pattern Matching with Swaps problem is that of finding all locations i for which there exists a swapped version T′...


Efficient Algorithms for Approximate String Matching with Swaps (Extended Abstract)

Most research on the edit distance problem and the k-differences problem considered the set of edit operations consisting of changes, insertions, and deletions. In this paper we include the swap operation that interchanges two adjacent characters into the set of allowable edit operations, and we present an O(t min(m, n))-time algorithm for the extended edit distance problem, where t is the edit...


Pattern Matching with Swaps

A preliminary version of this paper appeared in FOCS 97. Let a text string T of n symbols and a pattern string P of m symbols from alphabet Σ be given. A swapped version T′ of T is a length n string derived from T by a series of local swaps (i.e., t′_ℓ ← t_{ℓ+1} and t′_{ℓ+1} ← t_ℓ), where each element can participate in no more than one swap. The pattern matching with swaps problem is that of finding all lo...


Performance Evaluation of Local Detectors in the Presence of Noise for Multi-Sensor Remote Sensing Image Matching

Automatic, efficient, accurate, and stable image matching is one of the most critical issues in remote sensing, photogrammetry, and machine vision. In recent decades, various algorithms have been proposed based on the feature-based framework, which concentrates on detecting and describing local features. Understanding the characteristics of different matching algorithms in various applications ...




Publication date: 2000